A Robust Framework for Web Information Extraction and Retrieval
Similar resources
A Data Extraction and Visualization Framework for Information Retrieval Systems
Recent years have seen continuous growth in the amount of data that both public and private organizations collect and profit from. Search engines are the most common tools used to retrieve information, and more recently, clustering techniques have proven effective in helping users skim query results. The majority of the systems proposed to manage information provide text...
A Framework for Automatic Document Understanding for Web Information Retrieval
Most web search engines use a keyword-based approach to search for information on the web. When a user submits a query to the search engine, the web crawler tries to match the keywords with the file name, URL, or meta tags of the documents. As a result, the user may get many non-relevant documents along with relevant ones. It can lead to the frustration of informati...
PolyUHK: A Robust Information Extraction System for Web Personal Names
Personal information extraction is an important component of advanced information retrieval. Two problems need to be solved in this practical task: personal name ambiguity and extraction of personal information for a specific person. For personal name ambiguity, which is a very common phenomenon in the fast-growing Web resource, we propose a robust system which extracts features wit...
OCR++: A Robust Framework For Information Extraction from Scholarly Articles
This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written ...
A Framework for Decentralized Ranking in Web Information Retrieval
Search engines are among the most important applications or services on the web. Most existing successful search engines use global ranking algorithms to generate the ranking of documents crawled in their databases. However, global ranking of documents has two potential problems: high computation cost and potentially poor rankings. Both of the problems are related to the centralized computation...
Journal
Journal title: International Journal of Machine Learning and Computing
Year: 2014
ISSN: 2010-3700
DOI: 10.7763/ijmlc.2014.v4.403